(54) SPEECH RECOGNITION APPARATUS

(71) We, Standard Telephones and Cables Limited, a British Company,
of STC House, 190 Strand, London, W.C.2, England, do hereby declare
the invention, for which we pray that a patent may be granted to us,
and the method by which it is to be performed, to be particularly
described in and by the following statement:

This invention relates to speech recognition apparatus and is
particularly applicable to man/machine communication interfaces
required in, for example, the computer industry.

The nature of speech is such that it lends itself to treatment in
terms of binary features, at least in classical analysis. Difficulties
arise in finding acoustic correlates of the classical distinctive
features, or in defining any set of acoustic features which is
sufficient for recognition. Even defining what is meant by
'sufficient' is not solved in any real sense. Generally speaking, such
sets of binary features as have been defined are, moreover, far from
statistically independent.

In order to find out if a set of features is sufficient for
recognition, the most practical approach for speech, with its high
information content and considerable variability, is to adopt an
empirical, statistical approach, and determine error rates. No
recognition scheme will ever be perfect, because a real input can
never be sufficiently precisely defined. Therefore, one cannot show
that a recognition scheme will not work simply by producing an example
of speech where it fails. A recognition scheme works if it performs up
to some acceptable standard based on the statistics of its
performance.

In the case of speech recognition certain basic elements are generally
accepted as necessary: a pre-processor, which converts the acoustic
signal into some form of data; a processor, which selects and
transforms the data into a form suitable for decision; and a
classification process, which is given a data pattern from the
processor and classifies it, correctly or incorrectly, or rejects it.
The aim may be to maximise the number of correct classifications, or
minimise the number of incorrect classifications.

Since the most practical way to evaluate features is to test them in a
recognition system, it is not unreasonable to select a really good
classification procedure (one that is optimum, simple, and well
understood being ideal) and find out what its input requirements are.
The processing sections are then defined in terms of input (acoustic
signal), output (required input for the decision process), and purpose
(features, relevant to recognition, requiring to be detected). In view
of the supposed 'binary opposition' basis of speech perception, and
the known optimality of the Maximum Likelihood Strategy (MLS) which
can be realised for binary feature spaces, it (the MLS) is a prime
candidate for the decision classification process.

The maximum likelihood decision is a guaranteed optimum procedure, but
is only solved for rather restricted cases:

(i) where the probability distribution is in terms of a binary space
of independent features,

(ii) where the probability distribution is Gaussian, with equal
co-variance matrices.
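By way of illustration only, case (i) can be sketched as follows. The
sketch assumes each word is described by one probability per feature
(the chance that the feature bit is '1'), that the features are
independent, and that every probability lies strictly between 0 and 1.
The names `log_likelihood`, `classify` and `CLASSES`, and the numbers
in the table, are hypothetical and do not come from the specification.

```python
import math


def log_likelihood(bits, probs):
    """Log-probability of a binary pattern, assuming independent features.

    probs[k] is the assumed probability that feature k is '1' for a word.
    """
    total = 0.0
    for x, p in zip(bits, probs):
        total += math.log(p if x else 1.0 - p)
    return total


def classify(bits, classes):
    """Return the word whose feature model makes the pattern most likely."""
    return max(classes, key=lambda name: log_likelihood(bits, classes[name]))


# Hypothetical per-feature probabilities for two words (illustration only).
CLASSES = {
    "one": [0.9, 0.1, 0.8],
    "two": [0.2, 0.7, 0.3],
}
```

Because the features are taken as independent, the likelihood of a
pattern is a product over bits, so the logarithm reduces to a sum; this
is what keeps the binary case tractable.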

According to the invention there is provided a speech recognition
apparatus including means responsive to selected acoustic
characteristics for decomposing a signal representing an acoustic
input into analogue signals on parallel channels, each analogue signal
being representative of a different acoustic feature of the input,
means for transforming the analogue signals into binary signals on
parallel channels, the binary signals constituting time ordered event
markers relating to the occurrence or occurrences of the respective
acoustic features represented by the analogue signals, means for
generating further binary signals being time ordered event markers
marking the occurrence or occurrences of specified sequences of event
markers relating to the occurrence of two or more different features
in specified sequences, and means for storing in a fixed predetermined
sequence binary information representing both the content of the
acoustic input in terms of the different individual acoustic features
of the input and the content of the acoustic input in terms of the
specified sequences of acoustic features.

In a preferred embodiment of the invention the apparatus further
includes means for determining the likelihood ratio of occurrence to
non-occurrence of the constituents of the stored binary information in
comparison with a reference pattern and means responsive to said ratio
whereby a decision is made for accepting, rejecting or requesting a
repeat of the acoustic input.
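A three-way decision of this kind might be sketched as below. This is
one possible reading, not the circuit of the embodiment: it assumes
independent bits, a per-bit probability of '1' under the reference word
(`p_word`) and under everything else (`p_other`), and two thresholds
whose values are purely hypothetical.

```python
import math


def log_likelihood_ratio(bits, p_word, p_other):
    """Sum of per-bit log ratios (reference word versus not that word)."""
    total = 0.0
    for x, p, q in zip(bits, p_word, p_other):
        num = p if x else 1.0 - p
        den = q if x else 1.0 - q
        total += math.log(num / den)
    return total


def decide(llr, accept_at=2.0, reject_at=-2.0):
    """Map a log likelihood ratio to accept, reject or a request to repeat.

    The two thresholds are assumptions chosen only for illustration.
    """
    if llr >= accept_at:
        return "accept"
    if llr <= reject_at:
        return "reject"
    return "repeat"
```

Widening the gap between the two thresholds trades more repeat requests
for fewer wrong acceptances or rejections.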

There are two problems. The features must be statistically
independent, and they need to be presented as a set of binary
observations, rather than the time-varying set of signals produced by
acoustic analysers.

The latter problem appears under many guises, the common ones in
speech being the 'segmentation' problem or the 'time-normalisation'
problem. There are actually two varieties of time-dependent
information in speech, a fact not commonly given explicit recognition.
One type of information concerns the duration of events, and the other
type concerns the order of events. While not wishing to make any claim
that there is only one way of dealing with this type of duality, it is
suggested that it clarifies one's thinking, and allows the core of the
problems to be recognised more easily, if these two types of time
information are thought of and handled separately. The further
suggestion is made that the handling of necessary duration information
is essentially part of the acoustic analysis. The output of the
acoustic analyser then consists of a set of data lines which carry
data in the form of standard pulses. Each channel can, for example, be
derived from a circuit responsive to some particular characteristic of
speech determined from the acoustic analysis, and depending on
duration and/or frequency cues. For a simple analysis scheme, examples
might be: high-frequency-energy present for more than time Ta but less
than Tb; high-frequency-energy present for more than time Tb;
no-significant-feature present for more than time Ta but less than
time Tb; no-significant-feature present for more than time Tb. There
is some evidence to suggest that this type of duration analysis,
coupled with signals derived from four octave frequency bands, is
sufficient to recognise the digits (for example, Ross, P. W., 'Limited
vocabulary adaptive speech recognition system', presented at the 23rd
Convention of the Audio Engineering Soc., 16-19 October, 1967). Two
points should be noted. First, the ternary manner of handling the
information, resulting from a threshold (below which the event is
ignored) and binary division of the noticeable event.



Such a division is intuitively reasonable for similar reasons to those
underlying the ternary proposal for handling spectral slope features
(i.e. positive slope, negative slope, no significant slope), and
amplitude features. The concept can also be applied to transition
rates for formants, rates of rise and fall of the mean power envelope
and the rate of change of slope. In some cases there is a division of
a noticeable event into two magnitude categories, in other cases there
is a division into two sign categories. Clearly in the second case it
can also prove profitable to consider each sign category as two
magnitude categories, if the magnitude has any significance. The
second point to notice is that we have been talking about the acoustic
analyser, and that the output consists of binary signal carrying
lines, which in this illustration consist of related pairs. The first
line would signal when the input signal terminated at Tx < Tb and the
second when the input signal terminated at Tx > Tb. In another
embodiment three output lines are provided, the third line carrying a
signal saying that Ta has been exceeded. If a 'dead' region were
allowed then both lines would be on together when there was doubt as
to which has occurred. Thus such methods convey duration and
occurrence information about a given feature in an economical and
usable way: i.e. it occurred, starting now; it was short, ending now;
it was long, ending now.

The above mentioned and other features of the invention and the manner
of attaining them will become more apparent and the invention itself
will be better understood by reference to the following description of
an embodiment of the invention, taken in conjunction with the
accompanying drawings, in which:

Fig. 1 is a block schematic of the major sections of an automatic
speech recognition apparatus;

Fig. 2 illustrates the effect of time and level hysteresis in the
determination of primitive acoustic features in speech;

Fig. 3 illustrates a set of primitive acoustic events for a word, the
compound acoustic events derived from them and a bit pattern
associated with the word;

Fig. 4 is a schematic of the significant constituents of one form of
acoustic analyser for Fig. 1;

Fig. 5 is a schematic of the significant constituents of a feature
time-continuity filter with delay normalisation used in the
arrangement [...];

Fig. 6 is a schematic of the significant constituents of a ternary
event detector used in the arrangement of Fig. 4;

Fig. 7 illustrates the operation of a ternary event detector for the
three possible cases of input-pulse duration;

Fig. 8 is a schematic of one form of control circuit for Fig. 1;

Fig. 9 is a schematic of the significant constituents of one form of
sequence detector used in the arrangement of Fig. 1;

Fig. 10 is a schematic of the significant constituents of an
elementary sequence event generator suitable for use in the
arrangement of Fig. 4, and

Fig. 11 is a schematic of the significant constituents of the decision
logic used in the arrangement of Fig. 1.

In the arrangement shown in Fig. 1 the speech is fed to an acoustic
analyser 100. A set of filters or other detection elements PAC1, PAC2
.... PACn decompose the speech into Primitive Acoustic Characteristics
(PAC). Each PAC is reduced to a Primitive Acoustic Feature (PAF) by
corresponding threshold devices PAF1 .... PAFn. These threshold
devices each have a certain degree of hysteresis so that once a
decision has been made regarding the PAF this decision is adhered to
until there is good reason to change the decision. Such a decision
represents the formation of a minimum 'null hypothesis' consistent
with the incoming evidence and may consist of 'the feature is
occurring' or 'the feature is not occurring'. The hypothesis is not
abandoned until it is inconsistent with more recent evidence, rather
than merely inadequately supported. Without the ability to make and
stick to such minimum hypotheses, the machine's ability to structure
its input (an essential preliminary to making good decisions) is
seriously handicapped. In physical terms the effect on signals is as
illustrated in Figures 2A to 2H. At some level of evidence it is
necessary to say a feature is present, and then stick to this decision
until the evidence very definitely shows that the feature is absent.
Consider an analogue signal containing a PAC the duration of which, in
real time, is from T1 to T2 (Fig. 2A), this signal, of course, being
unavailable directly. Two trigger levels tl1 and tl2 are indicated
(Fig. 2B) in relation to the analogue signal. The output signal when
the lower trigger level tl1 is considered without any hysteresis is
one indicating the apparent occurrence of several PAF's of varying
durations (Fig. 2C). This output is misleading as, due to the nature
of speech, there is for all practical purposes only one PAF of
duration T1 to T2. Incorporating time hysteresis in the circuit has
the effect of eliminating some of the insignificant variations in the
input signal. The hysteresis introduces a time delay τ where τ is the
time for which the signal must be continuously in one particular state
for that state to occur as an output. It will be noted that a spurious
pulse late in time, because of the hysteresis, incorrectly extends the
output (Fig. 2D).

If the trigger level is raised to tl2 the effect of the same degree of
hysteresis is to eliminate not only the spurious responses but also
the correct responses (Fig. 2E). If a time hysteresis τ is introduced
the analogue signal is never of sufficient amplitude for long enough
to give a significant output (Fig. 2F). Combining the two trigger
levels without time hysteresis, with above tl2 being 'on' and below
tl1 being 'off', results in a form of amplitude hysteresis which
eliminates the lesser of the insignificant fluctuations in the output
(Fig. 2G). Introducing a time hysteresis τ in the combined output
eliminates the greater insignificant fluctuations, resulting in a
proper recognition of the PAF with only a time delay of τ (Fig. 2H).
It will be noted that both amplitude and time are involved in the
hysteresis. This is necessary, at the practical level, first in order
to make a reasonable representation of the input, and secondly to
produce an output signal suitable for subsequent processing.
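The combination of amplitude and time hysteresis can be sketched in
software as follows. This is a sampled-data analogy under stated
assumptions, not the threshold circuit of the embodiment: the signal is
taken as a list of samples, τ is counted in samples (`tau`), and the
function names are hypothetical.

```python
def amplitude_hysteresis(samples, tl1, tl2):
    """Two-level trigger: 'on' above tl2, 'off' below tl1, else unchanged."""
    state = False
    out = []
    for s in samples:
        if s > tl2:
            state = True
        elif s < tl1:
            state = False
        out.append(state)
    return out


def time_hysteresis(states, tau):
    """Change the output only after the input has held the new state
    for tau consecutive samples (tau is an assumed sample count)."""
    out_state = False
    run = 0  # consecutive samples that disagree with the current output
    out = []
    for s in states:
        if s != out_state:
            run += 1
            if run >= tau:
                out_state = s
                run = 0
        else:
            run = 0
        out.append(out_state)
    return out
```

Applying `amplitude_hysteresis` first and `time_hysteresis` second
corresponds to the combined case of Fig. 2H: small ripples are absorbed
by the two trigger levels, and brief drop-outs by the persistence
requirement, at the cost of a delay.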

The outputs of PAF1 to PAFn consist of signals indicating when certain
important features of the speech signal are present and when they are
absent. The content is specified by which lines are active, but the
order is still implicit in their order of output. The order is
difficult to specify because the signals overlap. Before any detection
of sequential characteristics is carried out, therefore, it is
necessary to carry out a little more processing, namely to change an
extended PAF into two events, 'primitive acoustic events' (PAE's),
which are standard pulses marking the time when a decision is taken
that the feature is present, and the time when it is decided that the
feature is absent. There is a snag. In doing this we are, rightly,
sorting into signals which can be ordered meaningfully, but at the
same time we are consigning information about absolute duration of the
PAF's to mere implication in terms of the order of events. This may be
undesirable simply because the first event after a feature has begun
may be that the feature has ended, and if the duration of such a
feature is significant, then we have lost it, for at this stage we are
interested in processing for order. The trouble arises because a
distinction by absolute duration, such as that between a stop release
(say for /t/) and a fricative (say /s/), depends on content and not
order. Thus event detection must also take account of absolute
duration, and in that way complete the extraction of content. An event
detector will therefore have one input, a PAF, and N+1 outputs, one
marking the beginning of the PAF, and the others marking its end in
each of N duration categories. Evidence suggests that N=2 is usual for
English. In any case, if the duration of a PAF is ambiguous (it ends
just on the boundary, or within half a standard pulse width) then, to
avoid losing information, and perhaps for other reasons as well, the
occurrence of both the possible events should be indicated. Thus, in
continuous speech, a machine might need to consider a silence of
ambiguous duration as either a stop-gap or an end-of-phrase gap.
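For N=2 the event detector described above might be sketched as
follows. The sketch assumes the PAF arrives as one boolean per sample,
that the category boundary `t_b` and the ambiguity `margin` are given
in samples, and that the event labels are free to choose; all of these
names are hypothetical.

```python
def paf_to_events(paf, t_b, margin=0):
    """Turn a sampled PAF into time-stamped primitive events.

    Returns (time, kind) pairs with kind in 'begin', 'end_short' or
    'end_long'. A duration within `margin` of t_b yields both end events.
    """
    events = []
    start = None
    # A trailing False closes a PAF that is still on at the last sample.
    for t, on in enumerate(list(paf) + [False]):
        if on and start is None:
            start = t
            events.append((t, "begin"))
        elif not on and start is not None:
            duration = t - start
            if duration <= t_b + margin:
                events.append((t, "end_short"))
            if duration > t_b - margin:
                events.append((t, "end_long"))
            start = None
    return events
```

With `margin=0` exactly one end event is produced for each PAF; a
positive margin reproduces the rule that an ambiguous duration signals
both possibilities.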
At this stage we have reduced the original input to a set of
primitives, the PAE's. Resolution of the order is determined by the
width of the standard pulse representing each such event. If two
events overlap, we cannot assign precedence between the pair
concerned. However, we may further process the information to extract
significant aspects of the sequence of events in terms of structural
descriptions called 'compound acoustic events' (CAE's) using a grammar
based method.

The determination of the order information is performed in the
sequence detector 200. The PAE's are selectively applied to Elementary
Sequence Elements ESE1, ESE2 .... ESEn. Each has two primary inputs
for the events whose precedence is to be computed, and a third input
into which prohibited events ('sequence breakers') are OR-ed together.
One such element corresponds to one level of recursion of the
equivalent computer analysing procedure. The equivalent function in
computer simulation is, of course, carried out by a single recursive
subroutine. The grammar is that of a descriptive language for percepts
(in this case, words, which are auditory percepts). The language must
describe, for example, pertinent aspects of the ordering of the
primitives. The necessary pattern description language may be simple,
which would compensate for the relative complexity of the primitives
required for speech. There is an overwhelming gain in operational
flexibility when the specification of sub-patterns, their relations,
etc., in terms of which the pattern is to be analysed, is separated
from the mechanism which does the analysis, for the specification may
easily be changed. Sub-patterns, which we may call 'compound acoustic
events', are defined in terms of sub-patterns and/or primitives only,
which is the same as saying CAE's are defined in terms of CAE's and/or
PAE's only. Let us call both types of event simply 'events', where it
is not confusing. There is only one relationship function, that of
precedence. Thus sub-patterns or CAE's may be defined recursively
without specifying property functions. The grammar is, however,
context sensitive. It is necessary to specify, at each level of the
recursions, sub-lists of other objects which must not bear a
prohibited relationship to the object defined at the level involved.
In less abstract terms, this amounts to a statement that one can
specify that certain other events must not intervene between the two
events whose precedence function is being evaluated. A chief advantage
of recursive definition is that the structure of sub-patterns, or
CAE's, is specified by their name. Thus, considering a computer
simulation, the list (((A(BC))F)B) would be decomposed by the analyser
programme to a head ((A(BC))F) and a tail B. The first sub-list of
prohibited objects would tell the analyser which events were not
allowed to intervene between the head and tail at this level. The tail
is a primitive, and therefore is available as a set of PAE pulses
marking the times at which the event B occurred. If the head were also
available, then the analyser could establish the times at which B
occurred immediately after the head event, discounting all intervening
events except those prohibited. If the head were not available (from
previous determination) then it would be treated as a new event to be
recognised and the process repeated. Eventually a level of recursion
in the simulation would be reached at which only primitives or
previously recognised events occupied the tops of the head and tail
stacks that had built up, and the procedure [...] generating sets of
event pulses corresponding to the times of the various events on the
head and tail lists, until the time marker pulses for the event
originally specified were generated. Such a grammar-controlled
analyser has been simulated on a computer for speech recognition
studies. For considering a hardware embodiment the specification of
these time markers is analogous to the identification of picture
points associated with a pattern or sub-pattern for a picture. The
final ESE outputs are entered as binary information B1 .... Bn in the
Bit Pattern Register 300. The output of the sequence detector may be
in several forms. (Note that the sub-patterns are synonymous with
events as indicated above.) For example:

(1) A bit pattern, each bit corresponding to a particular CAE or PAE,
and set to '1' if the event in question was detected.

(2) A bit pattern representing a set of 'barometer' type counts
(count = number of bits set), each count representing the number of
times a given event occurred.

(3) A varying bit pattern, held in monostables whose period would be
adjusted to the length of the longest period for which a given event
might be significant.
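The recursive head and tail analysis, together with output form (1),
might be simulated along the following lines. This is a sketch under
stated assumptions rather than the simulation referred to in the text:
events are nested 2-tuples of names, each primitive carries a list of
pulse times, a single list of sequence breakers is applied at every
level (the text allows a separate sub-list per level), and all function
names and example times are hypothetical.

```python
def sequence_times(head, tail, breakers=()):
    """Times at which a tail pulse follows a remembered head pulse."""
    head, tail, breakers = set(head), set(tail), set(breakers)
    armed = False  # True while a head pulse is remembered
    out = []
    for t in sorted(head | tail | breakers):
        if t in breakers or (t in head and t in tail):
            armed = False  # sequence broken, or precedence undecidable
        elif t in head:
            armed = True
        elif armed:  # a tail pulse on its own, after a head pulse
            out.append(t)
            armed = False
    return out


def event_times(spec, primitives, breaker_names=()):
    """Recursively resolve a primitive name or a (head, tail) pair."""
    if isinstance(spec, str):
        return list(primitives.get(spec, []))
    head, tail = spec
    breakers = [t for name in breaker_names for t in primitives.get(name, [])]
    return sequence_times(
        event_times(head, primitives, breaker_names),
        event_times(tail, primitives, breaker_names),
        breakers,
    )


def bit_pattern(specs, primitives, breaker_names=()):
    """Output form (1): one bit per event, '1' if it was detected."""
    return [1 if event_times(s, primitives, breaker_names) else 0 for s in specs]


# Hypothetical pulse times for the list (((A(BC))F)B) used in the text.
PRIMS = {"A": [10], "B": [20, 60], "C": [30], "F": [50], "X": [55]}
WORD = ((("A", ("B", "C")), "F"), "B")
```

With these times `event_times(WORD, PRIMS)` marks the final B at time
60; declaring X a sequence breaker removes the detection, because X
falls between the head and the tail at the outermost level.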

Figure 3 illustrates a set of primitive events, the compounds derived
assuming no other event is allowed to intervene (thus all events may
be said to be 'sequence breakers'), and the bit patterns derived
according to output form [...]

Thus the original set of time-varying analogue signals may, in the
manner suggested, and as illustrated with reference to a computer
based grammar, be translated into a set of non-ordered, binary
features, using output form (1). Decision logic 400 is organised on
the simple basis of bit-for-bit matching with patterns stored as plugs
in a three layer matrix board. These allow presence, absence, or don't
care conditions to be specified, the latter condition obtaining when a
plug [...]
firing. A '1' pulse is therefore produced at the output of gate 24
and, the output of gate 27 being normally '0', the consequent '0'
pulse from gate 25 and '1' pulse from gate 26 lead to the output of a
standard pulse, width t, from monostable 115 and drive circuit 116.

If the input ends during the period of ambiguity (determined by the
setting both of the first and of the second variable monostables 114,
117) it will have continued past the end of the firing period of the
first variable monostable 114. Gate 30 will, therefore, have received
two simultaneous '0' inputs, giving rise to a '1' pulse at its output,
starting at the end of the period of first variable monostable 114,
and finishing at the end of the input signal. This '1' pulse is
inverted by gate 31, and the trailing edge thus triggers the fixed
monostable 118 leading to the production of a standard pulse marking
the end of the input at the output of drive circuit 119. At the same
time the leading edge of the pulse from gate 30 triggers the second
variable monostable 117, period d, and for this period of time a '0'
is present at the output of gate 28. Thus if the input stops before
the expiration of d (i.e. the input stops during the period designated
as ambiguous), gate 27 will have two simultaneous '0' inputs, and a
'1' pulse will appear on the output, starting at the end of the input,
and ending at the expiration of d. This '1' pulse, acting on gate 25,
produces a '0' pulse at the input to gate 26 and hence a standard
pulse from the output of monostable 115 marking the end of the input
(since the output for gate 24 has remained '0'). Thus, an input pulse
which is ambiguously close in duration to the nominal duration tb will
produce a standard pulse both from monostable 115 and monostable 118,
as required.

Finally, if the input ends after the second 
monostable 117 has ceased firing, then only 
the standard pulse from monostable 118 will 
be produced. 

It is seen, therefore, that monostable 112 produces a pulse each time
the input PAF starts; monostable 115 produces a pulse if the input PAF
lasts less than tb; monostable 118 produces a pulse if the input PAF
lasts longer than tb; and pulses appear simultaneously from
monostables 115 and 118 if the input PAF duration is ambiguously close
to tb. In this manner PAF's are transformed into PAE's.

To return to Figure 4. A freeze level is brought into gates 7, 8, 9,
10, 11, 12 to inhibit the production of PAE's when the machine is
frozen (and hence in the output cycle). For the 'gap' channel, we
cannot inhibit at the PAF level, because silence will be present at
the time of freezing, and a spurious 'end of long gap' and
subsequently 'beginning of gap' will be produced. Therefore freezing
at this level is effected at the PAE, with the three TED4 outputs
being inverted in gates 13, 14, 15 and then applied to gates 10, 11,
12 together with the freeze level.

It is convenient to consider the controller next. Whenever a
'beginning of gap' signal occurs, i.e. from gate 10, Fig. 4, the
end-of-word integrating one-shot 501, Fig. 8, starts timing.

If no PAE from gate 11 occurs between the last PAE from gate 10 and
the expiration of the period of the integrating one-shot, then gate 36
receives two simultaneous '0' inputs and the output goes to '1'
starting at the instant that the monostable period expires. This
triggers the display monostable 502, and the leading edge of the
output sets the control bistable 503, which, in turn, sets the freeze
level via the freeze drive 504 to '1', freezing the machine.

Note that the PAE from gate 12 line is taken to the start bistable
505. The first PAE produced for any word, if the beginning of the word
is not missed, must be a PAE from gate 12, 'end of long gap'. If this
is not so, either too much noise preceded the word, or the speaker
started speaking before the machine 'unfroze' from the last operation.
Thus, if the start bistable is still in the reset condition when the
machine freezes, some sort of error has occurred. The start bistable
levels are therefore used to inhibit the computing indicator drive,
and the output level drive, when in the reset condition, and to allow
the ready or error indicators to be driven depending on the state of
the control bistable 503; when in the set condition the start bistable
inhibits the error and ready indicator drives and, depending on the
state of the control bistable, allows the computing indicator or
output level and indicator to be driven.

Continuing now from the last paragraph but one. When the machine is
frozen, depending on whether or not a valid start was obtained, either
the output level will also appear, or an error indication will be
made, and output suppressed. The machine stays frozen in the output
cycle until the display monostable period expires. Gate 37 inverts the
output of the display monostable so that the trailing edge fires the
reset monostable 506. If the switch following 506 is set to 'auto' the
output of the reset monostable produces a reset level via the reset
drive 507 which clears the control and start bistables 503, 505. It is
also taken to other parts of the machine to clear the memory and
output stores of the sequence detector. Thus the reset level puts the
machine in the ready state, cleared for action, and 'unfrozen'.

The switch is provided to inhibit resetting, and to allow manual
resetting, if desired. The outputs of 503 and 505 are gated in gates
32, 33, 34 and 35 to obtain the required 'ready', 'error', 'computing'
and 'output' indicator signals. The output signal is derived via an
output drive circuit 508.
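The end-of-word and valid-start logic of the controller might be
modelled as below. This is a behavioural sketch only: it assumes gate
10 carries 'beginning of gap', gate 12 'end of long gap' and (by
elimination, which the text does not state) gate 11 'end of short gap',
that events arrive as time-stamped labels, and that `word_gap` stands
for the period of the integrating one-shot 501. All names are
hypothetical.

```python
def run_controller(events, word_gap):
    """Return (freeze_time, status), or None if the machine never freezes.

    events: (time, kind) pairs, kind in 'begin_gap', 'end_short_gap',
    'end_long_gap'. status is 'output' after a valid start, else 'error'.
    """
    started = False   # models the start bistable 505
    gap_began = None  # time at which the one-shot was last triggered
    for t, kind in sorted(events):
        if gap_began is not None and t - gap_began >= word_gap:
            break  # the one-shot expired before this event arrived
        if kind == "end_long_gap":
            started = True
        elif kind == "begin_gap":
            gap_began = t
        elif kind == "end_short_gap":
            gap_began = None  # speech resumed, so keep listening
    if gap_began is None:
        return None
    return (gap_began + word_gap, "output" if started else "error")
```

A word that opens with 'end of long gap' and is followed by a silence
longer than `word_gap` freezes with status 'output'; the same silence
without a valid start gives 'error'.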






1,255,834 






The sequence detector 200 uses ESE's (Elementary Sequence Elements) to
carry out first order Sequence Detection on the basis of selected
PAE's. Each of the ESE's has two main inputs, and two auxiliary
inputs. The operation may be described with reference to Figure 10.

The purpose of the ESE is to produce a standard pulse out when the two
main inputs are sequentially activated. If we call the two main inputs
i and j, then one output gives a pulse when j occurs, following i, and
the other gives a pulse out when i occurs, following j. We may
designate these pulses ij and ji. They are standard pulses, of
duration t microseconds. If the two main inputs overlap, or if the
input labelled S/B, for Sequence Breaker, is activated between the
occurrence of one main input and the other, then no output occurs; the
device is, instead, reset appropriately. The occurrence of either main
input is 'remembered' if either persists, by itself, after the other
inputs have stopped. The device is symmetrical with respect to the two
inputs and outputs, so the operation of half will now be detailed.

If a pulse appears at i, and no other input, then the bistable,
comprised of gates 40 and 41, is set, and a '0' appears on the top
input to gate 42. If a pulse then appears at j, and no other input,
the output of gate 44 falls to zero, during the pulse, and a '1' pulse
is therefore produced from gate 42, which momentarily has four
simultaneous '0' inputs, presuming the other two inputs are at '0'.
This '1' pulse causes the monostable 202 to fire, and an output pulse
is produced via drive circuit 203. The output signal also is fed back
to clear the memory bistable, the additional connection to gate 39
ensuring that there is no ambiguity in the resetting operation due to
i becoming active again. The device is then ready to register another
'Elementary Sequence'.

Gate 43 produces a '1' output if both i and j are present at the same
time. This prevents either output being activated by either
input/bistable combination by inhibiting gate 42, and also resets the
memory bistables, ambiguity being prevented by the cross-connections
from i to gate 45 and j to gate 39. Whichever input lasts longest will
eventually be remembered, as is appropriate. If a Sequence Breaker
occurs, then, again, the memory bistables are positively reset, and
the output is inhibited by the connections to gates 42 and 48. Thus a
Sequence Breaker occurring in the middle of an Elementary Sequence
does 'break the sequence'.

The input marked R/S, for Reset, is activated by the reset level
generated by the controller, and simply clears the memory bistables
ready for another operation. There is no problem of conflict with
other signals, since the machine is frozen at the instant of
resetting, though it could provide additional protection to insert a
slight delay in the resetting of the control bistable, to make sure
the machine is positively reset before it is 'unfrozen'. The final
outputs of some if not all the ESE's are entered directly into the Bit
Pattern Register 300, Fig. 9, also shown in Fig. 1.
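The behaviour of a single ESE can be summarised by a small state
machine. The sketch below works in discrete ticks and is a behavioural
model only, not the gate network of Fig. 10; the class and method names
are hypothetical, and it assumes that a pulse which completes a
sequence is itself remembered as the possible start of the reverse
sequence.

```python
class ESE:
    """Behavioural model of one Elementary Sequence Element."""

    def __init__(self):
        self.seen_i = False  # memory bistable for input i
        self.seen_j = False  # memory bistable for input j

    def reset(self):
        """R/S input: clear both memory bistables."""
        self.seen_i = False
        self.seen_j = False

    def step(self, i=False, j=False, sb=False):
        """Advance one tick and return the (ij, ji) output pulses."""
        if sb or (i and j):
            # A sequence breaker or overlapping inputs: no output, reset.
            self.reset()
            return (False, False)
        ij = j and self.seen_i  # j following a remembered i
        ji = i and self.seen_j  # i following a remembered j
        if j:
            self.seen_i = False  # output fed back to clear the memory
            self.seen_j = True   # assumption: j is now remembered
        if i:
            self.seen_j = False
            self.seen_i = True   # assumption: i is now remembered
        return (ij, ji)
```

Feeding i and then j gives an ij pulse on the second tick; inserting a
sequence breaker between them, or presenting both together, gives no
output.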

The final section of the machine, the Decision taker 400, Fig. 1, is
straightforward gating logic as shown in Fig. 11. The matrix may be,
in practice, a three layer plug-board. The strips in one layer of such
a plug-board matrix may be shorted to the strips in either of the
other two layers, which strips are arranged at right-angles to the
strip of the first layer. By putting in suitable plugs a pattern of
input states may be selected for each desired output, so that the
output only comes on when the specified inputs are in the specified
states, combinations of '1' and '0'. Lamps and drivers are provided to
allow the operation to be monitored, and to allow the matrix rows and
outputs to be driven.

The decision taker is thus a straight pattern matching arrangement.
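The plug-board matching can be mimicked in a few lines. In this sketch
a template entry of 1 stands for a plug requiring the bit set, 0 for a
plug requiring it clear, and `None` for no plug (don't care); the
function names and the example templates are hypothetical.

```python
def matches(bits, template):
    """True if every specified template position agrees with the bits.

    Template entries: 1 = must be set, 0 = must be clear, None = don't care.
    """
    return all(want is None or bit == want for bit, want in zip(bits, template))


def matching_words(bits, templates):
    """Names of all stored patterns that the bit pattern satisfies."""
    return [name for name, tpl in templates.items() if matches(bits, tpl)]


# Hypothetical stored patterns for two words (illustration only).
TEMPLATES = {
    "one": [1, 0, None],
    "two": [0, 1, 1],
}
```

An empty result corresponds to no output lamp coming on, and more than
one name would indicate that the stored patterns are not mutually
exclusive.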

WHAT WE CLAIM IS: — 

1. Speech recognition apparatus including
means responsive to selected acoustic
characteristics for decomposing a signal
representing an acoustic input into analogue
signals on parallel channels, each analogue
signal being representative of a different
acoustic feature of the input, means for
transforming the analogue signals into binary
signals on parallel channels, the binary
signals constituting time ordered event
markers relating to the occurrence or
occurrences of the respective acoustic
features represented by the analogue signals,
means for generating further binary signals
being time ordered event markers marking the
occurrence or occurrences of specified
sequences of event markers relating to the
occurrences of two or more different features
in specified sequences, and means for storing
in a fixed predetermined sequence binary
information representing both the content of
the acoustic input in terms of the different
individual acoustic features of the input and
the content of the acoustic input in terms of
the specified sequences of acoustic features.

2. Apparatus according to claim 1 in which
the means for decomposing the signal
representing an acoustic input includes a
plurality of filters each arranged to pass a
different range of frequencies and means for
producing from the filters a plurality of
outputs each indicating the relative
amplitudes of one of the filter outputs with
respect to another filter output.

3. Apparatus according to claim 1 or 2
including means for detecting the total energy
in the input and means for producing from
the total energy detector an output indicating
that the total energy exceeds a predetermined
threshold level.

4. Apparatus according to claim 3 as
appended to claim 2 in which the means for
producing the outputs indicative of the
relative amplitudes of the filter outputs each
include trigger means having a first threshold
level whereby the output is not inhibited
when the ratio of one filter output amplitude
to another filter output amplitude exceeds
the first threshold and a second lower
threshold level whereby the output is
inhibited when the ratio falls below the
second threshold level.

5. Apparatus according to claim 4 including
means for producing an output when there is
a balance between two filter outputs.

6. Apparatus according to claim 5 including
means for inhibiting the balance output
when the total energy does not exceed the
predetermined threshold level.

7. Apparatus according to claim 4, 5 or
6 including a plurality of pulse generators
to each of which is applied one of the outputs
derived from the filters and the total energy
detector, each pulse generator being arranged
to produce an output pulse the start of which
occurs only when the input to the pulse
generator has been present continuously for
a predetermined period of time and the end
of which occurs only when the input has been
absent continuously for a predetermined
period of time.

8. Apparatus according to claim 7 wherein
each pulse generator includes a pair of
one-shot multivibrators each of which delivers
an output pulse which lasts for a predetermined
period of time after an input has been
applied, means for triggering one of the
multivibrators from the leading edge of an
input signal derived from a filter or total
energy detector, means for triggering the
other multivibrator from the trailing edge of
the input signal, and gating means for gating
the input signal with the pulses generated by
the two multivibrators whereby the gated
input signal forms the output of the pulse
generator.

9. Apparatus according to claim 8 including
means for normalising the delays occurring
in the outputs of a plurality of pulse
generators.

10. Apparatus according to claim 8 or 9
including a plurality of means for generating
binary information signals each producing an
output according to the significance and
duration of the output of one of the pulse
generators.

11. Apparatus according to claim 10
including a plurality of gating logic means each
of which is responsive to two or more binary
input signals whereby the relative sequential
occurrence of those signals can be determined
and means for generating a binary output
signal according to the relative sequential
occurrence of the binary input signals.

12. Apparatus according to claim 11 wherein
the binary input signals for some of the
gating logic means are those derived from
the pulse generators and the binary output
signals of some of the gating logic means
form the binary input signals to other gating
logic means.

13. Apparatus according to claim 11 or 12
including means for storing in a predetermined
sequence the binary output signals
from some of the gating logic means, and
means for comparing the stored information
pattern with predetermined binary information
patterns.

14. Apparatus according to claim 13
wherein the means for storing the binary
output signals includes one or more monostables.

15. Apparatus according to claim 13 or
14 including means for determining the
likelihood ratio of occurrence to non-occurrence
of the constituents of the stored binary
information in comparison with a reference
pattern and means responsive to said ratio
whereby a decision is made for accepting,
rejecting or requesting a repeat of the acoustic
input.

16. Apparatus according to any one of the
preceding claims 10 to 15 including means
for freezing the operation of or output from
each of the plurality of means for generating
binary information signals indicating the
significance and duration of outputs of the
pulse generators.

17. Speech recognition apparatus
substantially as described with reference to the
accompanying drawings.

G. H. EDMUNDS, 
Chartered Patent Agent, 
For the Applicants. 